In this exercise, we will be using functions from the tidyverse packages. The code chunk that loads them uses the chunk option message = FALSE to hide the start-up messages, including version information, that tidyverse normally displays.
library(tidyverse)
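Chunk options such as message = FALSE only take effect when the document is knitted. If you run the same code as a plain R script, a rough equivalent is to wrap the call in base R's suppressPackageStartupMessages(). This is a minimal sketch and assumes the tidyverse is installed:

```r
# Outside R Markdown there are no chunk options, so silence the
# start-up messages from library() directly.
suppressPackageStartupMessages(library(tidyverse))

# The core packages are still attached, even though nothing was printed.
"package:dplyr" %in% search()
```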
Load Datasaurus.csv. This file contains several different datasets, indicated by the column dataset, and each one contains points with x and y coordinates. Use the group_by() and summarise() functions to calculate the means and standard deviations of the x and y variables, grouped by the dataset column. You could also include the correlation between x and y in your summary, using the function cor(x, y). Then make a scatter plot of x vs y, faceted by the dataset column.
datasaurus <- read_csv("Datasaurus.csv")
datasaurus %>%
  group_by(dataset) %>%
  summarise(across(c(x, y),
                   list(mean = ~mean(.),
                        sd = ~sd(.))),
            corr = cor(x, y)) %>%
  ungroup()
# A tibble: 13 × 6
dataset x_mean x_sd y_mean y_sd corr
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 away 54.3 16.8 47.8 26.9 -0.0641
2 bullseye 54.3 16.8 47.8 26.9 -0.0686
3 circle 54.3 16.8 47.8 26.9 -0.0683
4 dino 54.3 16.8 47.8 26.9 -0.0645
5 dots 54.3 16.8 47.8 26.9 -0.0603
6 h_lines 54.3 16.8 47.8 26.9 -0.0617
7 high_lines 54.3 16.8 47.8 26.9 -0.0685
8 slant_down 54.3 16.8 47.8 26.9 -0.0690
9 slant_up 54.3 16.8 47.8 26.9 -0.0686
10 star 54.3 16.8 47.8 26.9 -0.0630
11 v_lines 54.3 16.8 47.8 26.9 -0.0694
12 wide_lines 54.3 16.8 47.8 26.9 -0.0666
13 x_shape 54.3 16.8 47.8 26.9 -0.0656
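across() applies each function in the named list to each listed column. By default it names the results {column}_{function}, which is where x_mean, x_sd and the other summary columns come from. The minimal sketch below uses a hypothetical three-point toy data frame (not part of Datasaurus.csv) so that the naming and the per-group cor() can be checked by hand. Its two groups have identical means and standard deviations but opposite correlations. On a small scale, this shows that a few summary numbers can hide differences in structure.

```r
library(dplyr)

# Hypothetical toy data: both groups use the same x and y values,
# but y runs in opposite directions.
toy <- tibble(
  dataset = rep(c("up", "down"), each = 3),
  x = c(1, 2, 3, 1, 2, 3),
  y = c(1, 2, 3, 3, 2, 1)
)

toy_summary <- toy %>%
  group_by(dataset) %>%
  # Produces x_mean, x_sd, y_mean, y_sd, then the per-group correlation.
  summarise(across(c(x, y), list(mean = ~mean(.x), sd = ~sd(.x))),
            corr = cor(x, y))

toy_summary
```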
ggplot(datasaurus, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(vars(dataset), ncol = 5)
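With 13 datasets and ncol = 5, the panels need ceiling(13 / 5) = 3 rows. ggplot2 exports a small helper, wrap_dims(), that computes these grid dimensions for a given number of panels. It returns c(nrow, ncol). The sketch below uses it as a quick check and does not claim to show how facet_wrap() is implemented:

```r
library(ggplot2)

# 13 panels laid out in 5 columns: returns c(nrow, ncol).
wrap_dims(13, ncol = 5)
```

Changing ncol, or giving nrow instead, reshapes the grid. By default facet_wrap() fills the panels row by row.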
We've seen the data in pig_behaviour_by_time.csv in the lectures. Use pivot_longer() to convert the seven pig behaviours (in the columns Upright through Nosing_pen) into a single categorical variable Behaviour, with the measurements in a variable Number (the number of pigs engaging in a particular behaviour). Use mutate() to derive a variable Proportion, the proportion of pigs engaging in a particular behaviour, by dividing Number by the Total_pigs variable. Store this data frame in a variable, and then display it using glimpse().
pig_behaviour_by_time <- read_csv("pig_behaviour_by_time.csv")
pig_behaviour_longer <- pig_behaviour_by_time %>%
  pivot_longer(Upright:Nosing_pen,
               names_to = "Behaviour",
               values_to = "Number") %>%
  mutate(Proportion = Number / Total_pigs)
glimpse(pig_behaviour_longer)
Rows: 1,960
Columns: 12
$ Pen <dbl> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3,…
$ Housing <chr> "FC", "FC", "FC", "FC", "FC", "FC", "FC", "FC", "FC", "FC",…
$ Treatment <chr> "HC", "HC", "HC", "HC", "HC", "HC", "HC", "HC", "HC", "HC",…
$ HousTreat <chr> "FC, HC", "FC, HC", "FC, HC", "FC, HC", "FC, HC", "FC, HC",…
$ Sex <chr> "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M",…
$ Total_pigs <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,…
$ Time <chr> "15 mins", "15 mins", "15 mins", "15 mins", "15 mins", "15 …
$ Time_brief <chr> "15m", "15m", "15m", "15m", "15m", "15m", "15m", "15m", "15…
$ Time_hours <dbl> 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25,…
$ Behaviour <chr> "Upright", "Aggression", "Chewing_pig", "Playing", "Vocalis…
$ Number <dbl> 10, 0, 2, 0, 0, 8, 0, 10, 0, 0, 1, 0, 10, 0, 8, 1, 3, 0, 0,…
$ Proportion <dbl> 1.0, 0.0, 0.2, 0.0, 0.0, 0.8, 0.0, 1.0, 0.0, 0.0, 0.1, 0.0,…
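pivot_longer() turns each wide row into one long row per pivoted column. Pivoting seven behaviour columns therefore multiplies the row count by seven: the 1,960 rows above correspond to 280 rows in the original file. glimpse() also reports Behaviour as <chr>, not as an R factor. If a genuine factor is wanted, pivot_longer()'s names_transform argument can convert the column during the pivot.

The minimal sketch below makes a few assumptions. It uses a hypothetical two-pen, two-behaviour toy data frame in place of the real file. It also assumes a reasonably recent tidyr, because names_transform is not available in very old versions.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide toy data: one row per pen, one column per behaviour.
pigs_wide <- tibble(
  Pen = c(1, 2),
  Total_pigs = c(10, 8),
  Upright = c(10, 4),
  Nosing_pen = c(0, 2)
)

pigs_long <- pigs_wide %>%
  pivot_longer(Upright:Nosing_pen,
               names_to = "Behaviour",
               values_to = "Number",
               # Convert the new names column to a factor during the pivot.
               names_transform = list(Behaviour = as.factor)) %>%
  mutate(Proportion = Number / Total_pigs)

pigs_long
```

The factor levels are sorted alphabetically here. Use factor(..., levels = ...) afterwards if a different order is needed for plotting.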
Use group_by() and summarise() to calculate the mean proportion of pigs engaging in each behaviour, for each combination of Housing and Treatment. (You will need to use na.rm = TRUE, as there are missing values in this dataset.) Store this data frame in a variable, and then display it using glimpse().
pig_behaviour_means <- pig_behaviour_longer %>%
  group_by(Behaviour, Housing, Treatment) %>%
  summarise(Proportion = mean(Proportion, na.rm = TRUE)) %>%
  ungroup()
`summarise()` has grouped output by 'Behaviour', 'Housing'. You can override
using the `.groups` argument.
glimpse(pig_behaviour_means)
Rows: 28
Columns: 4
$ Behaviour <chr> "Aggression", "Aggression", "Aggression", "Aggression", "Ch…
$ Housing <chr> "FC", "FC", "PS", "PS", "FC", "FC", "PS", "PS", "FC", "FC",…
$ Treatment <chr> "C", "HC", "C", "HC", "C", "HC", "C", "HC", "C", "HC", "C",…
$ Proportion <dbl> 0.02500000, 0.01529412, 0.01428571, 0.03396226, 0.20833333,…
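The message printed above is informational. By default, summarise() drops only the last grouping variable, and the ungroup() that follows removes the rest. In dplyr 1.0.0 and later, passing .groups = "drop" does both in one step and suppresses the message. The minimal sketch below uses a hypothetical toy data frame with one missing value, so it also shows na.rm = TRUE at work:

```r
library(dplyr)

# Hypothetical long-format toy data with one missing proportion.
toy_long <- tibble(
  Behaviour  = c("Aggression", "Aggression", "Aggression", "Upright"),
  Housing    = c("FC", "FC", "PS", "FC"),
  Treatment  = c("C", "C", "C", "C"),
  Proportion = c(0.2, NA, 0.4, 0.9)
)

toy_means <- toy_long %>%
  group_by(Behaviour, Housing, Treatment) %>%
  # na.rm = TRUE skips the NA. .groups = "drop" returns an ungrouped
  # tibble, so no grouping message is printed and ungroup() is not needed.
  summarise(Proportion = mean(Proportion, na.rm = TRUE), .groups = "drop")

toy_means
```

Without na.rm = TRUE, the Aggression/FC/C mean would be NA, because mean() propagates missing values by default.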
Use pivot_wider() on the summary data frame to make one column for each treatment combination (i.e., each combination of Housing and Treatment).
Hint: pivot_wider() allows you to provide more than one variable for names_from, e.g. names_from = c(column1, column2).
pig_behaviour_means %>%
  pivot_wider(names_from = c(Housing, Treatment),
              values_from = Proportion)
# A tibble: 7 × 5
Behaviour FC_C FC_HC PS_C PS_HC
<chr> <dbl> <dbl> <dbl> <dbl>
1 Aggression 0.025 0.0153 0.0143 0.0340
2 Chewing_pig 0.208 0.178 0.23 0.238
3 Exploring_pen 0.33 0.336 0.436 0.379
4 Nosing_pen 0.035 0.0471 0.0829 0.0566
5 Playing 0.0217 0.0188 0.0257 0.0113
6 Upright 0.527 0.601 0.724 0.723
7 Vocalising 0.115 0.107 0.141 0.166
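When names_from lists several columns, pivot_wider() joins their values with names_sep, which defaults to "_". That is how the column names FC_C, FC_HC, PS_C and PS_HC arise. The minimal sketch below uses a hypothetical one-behaviour toy data frame and shows both the default separator and an alternative:

```r
library(tidyr)

# Hypothetical long-format means for a single behaviour.
toy_means_long <- data.frame(
  Behaviour  = "Upright",
  Housing    = c("FC", "FC", "PS", "PS"),
  Treatment  = c("C", "HC", "C", "HC"),
  Proportion = c(0.5, 0.6, 0.7, 0.8)
)

# Default separator "_": columns FC_C, FC_HC, PS_C, PS_HC.
wide_default <- pivot_wider(toy_means_long,
                            names_from = c(Housing, Treatment),
                            values_from = Proportion)

# names_sep changes the separator, e.g. FC.C instead of FC_C.
wide_dot <- pivot_wider(toy_means_long,
                        names_from = c(Housing, Treatment),
                        values_from = Proportion,
                        names_sep = ".")

wide_default
```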
© 2021 Statistical Consulting Centre, The University of Melbourne.